Regularized Minimum Error Rate Training

Authors

  • Michel Galley
  • Chris Quirk
  • Colin Cherry
  • Kristina Toutanova
Abstract

Minimum Error Rate Training (MERT) remains one of the preferred methods for tuning linear parameters in machine translation systems, yet it faces significant issues. First, MERT is an unregularized learner and is therefore prone to overfitting. Second, it is commonly used on a noisy, non-convex loss function that becomes more difficult to optimize as the number of parameters increases. To address these issues, we study the addition of a regularization term to the MERT objective function. Since standard regularizers such as ℓ2 are inapplicable to MERT due to the scale invariance of its objective function, we turn to two regularizers, ℓ0 and a modification of ℓ2, and present methods for efficiently integrating them during search. To improve search in large parameter spaces, we also present a new direction finding algorithm that uses the gradient of expected BLEU to orient MERT's exact line searches. Experiments with up to 3600 features show that these extensions of MERT yield results comparable to PRO, a learner often used with large feature sets.
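As a rough illustration of how a regularizer can be folded into a MERT-style objective, the sketch below scores n-best lists under a weight vector and adds an ℓ0 penalty (a count of non-zero weights). This is a minimal sketch, not the authors' implementation; the function name, data layout, and penalty weight lam are assumptions made for illustration.

```python
import numpy as np

def regularized_mert_objective(w, nbest_feats, nbest_errors, lam=1.0):
    """Corpus error of the hypotheses selected by the linear model with weights w,
    plus an l0 penalty (number of non-zero weights) scaled by lam.

    nbest_feats[s]  : (n_hyps, n_feats) feature matrix for sentence s
    nbest_errors[s] : (n_hyps,) per-hypothesis error counts (e.g. 1 - sentence BLEU)
    """
    corpus_error = 0.0
    for feats, errors in zip(nbest_feats, nbest_errors):
        best = np.argmax(feats @ w)      # hypothesis picked by the current weights
        corpus_error += errors[best]
    l0_penalty = np.count_nonzero(w)     # l0 "norm": number of active features
    return corpus_error + lam * l0_penalty
```

In MERT-style tuning, an exact line search would minimize such an objective along a chosen search direction; the abstract's extension orients those directions using the gradient of expected BLEU.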


Similar articles

Regularized minimum variance distortionless response-based cepstral features for robust continuous speech recognition

In this paper, we present robust feature extractors that incorporate a regularized minimum variance distortionless response (RMVDR) spectrum estimator instead of the discrete Fourier transform-based direct spectrum estimator, used in many front-ends including the conventional MFCC, to estimate the speech power spectrum. Direct spectrum estimators, e.g., single tapered periodogram, have high var...
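The core of an MVDR (minimum variance distortionless response) spectrum estimate can be sketched in a few lines; below, diagonal loading of the autocorrelation matrix stands in for regularization. This is a generic illustration only, and the paper's RMVDR front-end may regularize the estimator differently; the model order, loading factor, and frequency grid are assumed values.

```python
import numpy as np
from scipy.linalg import toeplitz

def mvdr_spectrum(frame, order=12, n_freqs=257, loading=1e-3):
    """Minimum variance power spectrum of a (windowed) speech frame,
    with simple diagonal loading of the autocorrelation matrix."""
    n = len(frame)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(frame[:n - k], frame[k:]) / n for k in range(order + 1)])
    R = toeplitz(r) + loading * r[0] * np.eye(order + 1)   # diagonal loading
    R_inv = np.linalg.inv(R)
    omegas = np.linspace(0.0, np.pi, n_freqs)
    spectrum = np.empty(n_freqs)
    for i, w in enumerate(omegas):
        e = np.exp(-1j * w * np.arange(order + 1))          # frequency steering vector
        spectrum[i] = 1.0 / np.real(np.conj(e) @ R_inv @ e)
    return spectrum
```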


A Distributed Learning Method for ℓ1-Regularized Kernel Machine over Wireless Sensor Networks

In wireless sensor networks, centralized learning methods have very high communication costs and energy consumption. These are caused by the need to transmit scattered training examples from various sensor nodes to the central fusion center where a classifier or a regression machine is trained. To reduce the communication cost, a distributed learning method for a kernel machine that incorporate...


Block-Regularized m × 2 Cross-Validated Estimator of the Generalization Error

A cross-validation method based on m replications of two-fold cross validation is called an m × 2 cross validation. An m × 2 cross validation is used in estimating the generalization error and in comparing algorithms' performance in machine learning. However, the variance of the estimator of the generalization error in m × 2 cross vali...
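A plain (un-regularized) m × 2 cross-validated error estimate is simple to state: run two-fold cross-validation m times with independent splits and average the held-out error rates. The sketch below shows that baseline estimator; it deliberately omits the block-regularized construction of the splits studied in the paper, and the scikit-learn usage is an assumption of convenience.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold

def m_by_2_cv_error(model, X, y, m=5, seed=0):
    """Average held-out error over m independent replications of 2-fold CV."""
    rng = np.random.RandomState(seed)
    fold_errors = []
    for _ in range(m):
        splitter = StratifiedKFold(n_splits=2, shuffle=True,
                                   random_state=rng.randint(1 << 30))
        for train_idx, test_idx in splitter.split(X, y):
            clf = clone(model).fit(X[train_idx], y[train_idx])
            fold_errors.append(np.mean(clf.predict(X[test_idx]) != y[test_idx]))
    return np.mean(fold_errors)   # estimate of the generalization error
```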


Estimation for Intermittent Wireless Connections

Broadband wireless channels observed at a receiver cannot fully exhibit a dense nature in a low to moderate signal-to-noise ratio (SNR) regime if the channels follow a typical propagation scenario such as Vehicular-A or Pedestrian-B. It is hence expected that ℓ1-regularized channel estimation methods can improve channel estimation performance in broadband wireless channels. However, it is wel...
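To make the ℓ1-regularized setting concrete, the toy sketch below recovers a sparse channel impulse response from a small number of pilot observations with a LASSO fit; the measurement model, dimensions, and regularization strength are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_taps, n_pilots = 64, 32

# Sparse "true" channel: only a few non-zero taps
h_true = np.zeros(n_taps)
h_true[rng.choice(n_taps, size=4, replace=False)] = rng.standard_normal(4)

# Pilot observations y = A h + noise
A = rng.standard_normal((n_pilots, n_taps)) / np.sqrt(n_pilots)
y = A @ h_true + 0.01 * rng.standard_normal(n_pilots)

# l1-regularized (LASSO) channel estimate
h_hat = Lasso(alpha=0.01, max_iter=10_000).fit(A, y).coef_
print("estimation error:", np.linalg.norm(h_hat - h_true))
```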


Improved performance and generalization of minimum classification error training for continuous speech recognition

Discriminative training of hidden Markov models (HMMs) using segmental minimum classification error (MCE) training has been shown to work extremely well for certain speech recognition applications. It is, however, somewhat prone to overspecialization. This study investigates various techniques which improve the performance and generalization of the MCE algorithm. Improvements of up to 7% in relative...
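For reference, the smoothed MCE loss for a single training token is typically built from a misclassification measure that compares the correct class's discriminant score against a soft maximum over competing classes, then passes it through a sigmoid. The sketch below follows that standard formulation; the smoothing constants eta and gamma are assumed, not taken from the paper.

```python
import numpy as np

def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Smoothed minimum classification error loss for one token.

    scores  : (n_classes,) discriminant scores g_j(x)
    correct : index of the true class
    """
    competitors = np.delete(scores, correct)
    anti = np.log(np.mean(np.exp(eta * competitors))) / eta   # soft max over rivals
    d = -scores[correct] + anti                               # misclassification measure
    return 1.0 / (1.0 + np.exp(-gamma * d))                   # sigmoid of the measure
```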



Publication date: 2013